Strategies to parallelize a finite element mesh truncation technique on multi-core and many-core architectures
Abstract
Achieving maximum parallel performance on multi-core CPUs and many-core GPUs is a challenging task that depends on multiple factors. These include, for example, the number and granularity of the computations or the use of the memories of the devices. In this paper, we assess those factors by evaluating and comparing different parallelizations of the same problem on a multiprocessor containing a CPU with 40 cores and four P100 GPUs with Pascal architecture. As a case study, we use the convolutional operation behind a non-standard finite element mesh truncation technique in the context of open-region electromagnetic wave propagation problems. A total of six algorithms implemented using OpenMP and CUDA have been used to carry out the comparison, leveraging the levels of parallelism of both types of platforms. Three are presented for the first time, including a multi-GPU method, and two others are improved versions of algorithms previously developed by some of the authors. This paper presents a thorough experimental evaluation on a radar cross-section prediction problem. Results show that the performance obtained on the GPU clearly exceeds that of the CPU, much more so if multiple GPUs are used to distribute data and computations. Accelerations close to 30 are obtained on the CPU, while the multi-GPU version achieves accelerations larger than 250.
Similar resources
Finite element assembly strategies on multi- and many-core architectures
We demonstrate that radically differing implementations of finite element methods are needed on multicore (CPU) and many-core (GPU) architectures, if their respective performance potential is to be realised. Our experimental investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and div...
Solving Matrix Equations on Multi-Core and Many-Core Architectures
We address the numerical solution of Lyapunov, algebraic and differential Riccati equations, via the matrix sign function, on platforms equipped with general-purpose multicore processors and, optionally, one or more graphics processing units (GPUs). In particular, we review the solvers for these equations, as well as the underlying methods, analyze their concurrency and scalability and provide ...
Parallelizing Word2Vec in Multi-Core and Many-Core Architectures
Word2vec is a widely used algorithm for extracting low-dimensional vector representations of words. State-of-the-art algorithms, including those by Mikolov et al. [5, 6], have been parallelized for multi-core CPU architectures, but are based on vector-vector operations with "Hogwild" updates that are memory-bandwidth intensive and do not efficiently use computational resources. In this paper, we ...
Performance analysis of a 3D unstructured mesh hydrodynamics code on multi- and many-core architectures
Several next generation high performance computing platforms are or will be based on the so-called many-core architectures, which represent a significant departure from commodity multi-core architectures. A key issue in transitioning large-scale simulation codes from multi-core to many-core systems is closing the serial performance gap, that is, overcoming the large difference in single-core pe...
Many-Task Computing on Many-Core Architectures
Many-Task Computing (MTC) is a common scenario for multiple parallel systems, such as cluster, grids, cloud and supercomputers, but it is not so popular in shared memory parallel processors. In this sense and given the spectacular growth in performance and in number of cores integrated in many-core architectures, the study of MTC on such architectures is becoming more and more relevant. In this...
Journal
Journal title: The Journal of Supercomputing
Year: 2022
ISSN: 0920-8542, 1573-0484
DOI: https://doi.org/10.1007/s11227-022-04975-6